
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings

Hong, Harbin, Caldas, Sebastian, Leqi, Liu

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) increasingly appear in social science research (e.g., economics and marketing), it becomes crucial to assess how well these models replicate human behavior. In this work, using hypothesis testing, we present a quantitative framework to assess the misalignment between LLM-simulated and actual human behaviors in multiple-choice survey settings. This framework allows us to determine in a principled way whether a specific language model can effectively simulate human opinions, decision-making, and general behaviors represented through multiple-choice options. We applied this framework to a popular language model for simulating people's opinions in various public surveys and found that this model is ill-suited for simulating the tested sub-populations (e.g., across different races, ages, and incomes) for contentious questions. This raises questions about the alignment of this language model with the tested populations, highlighting the need for new practices in using LLMs for social science studies beyond naive simulations of human subjects.
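A natural instance of such a hypothesis test is a chi-square test of homogeneity between the LLM-simulated and the real answer distributions for a given question and sub-population. The sketch below is illustrative only: the counts are invented, and the paper's actual framework may use a different test statistic.

```python
# Hedged sketch: a chi-square test of homogeneity comparing the answer
# distribution of an LLM-simulated sub-population with real survey counts.
# The counts below are illustrative, not taken from the paper.
from scipy.stats import chi2_contingency

human_counts = [120, 80, 40, 10]   # respondents picking options A-D
llm_counts   = [60, 110, 20, 60]   # LLM simulations of the same group

chi2, p_value, dof, _ = chi2_contingency([human_counts, llm_counts])

# A small p-value rejects the null hypothesis that both samples come
# from the same answer distribution, i.e. it flags misalignment.
misaligned = p_value < 0.05
```

With these toy counts the two distributions differ sharply, so the test rejects the null; the paper's contribution is running such principled comparisons across many questions and demographic slices rather than eyeballing aggregate agreement.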


Proof-Carrying Neuro-Symbolic Code

Komendantskaya, Ekaterina

arXiv.org Artificial Intelligence

This invited paper introduces the concept of "proof-carrying neuro-symbolic code" and explains its meaning and value, from both the "neural" and the "symbolic" perspectives. The talk outlines the first successes and challenges that this new area of research faces. Keywords: Neural Networks, Cyber-Physical System Verification, Programming Languages, Neuro-Symbolic Programs. 1 Neuro-Symbolic Proofs and Programs Proof-carrying code is a long tradition within programming language research, broadly referring to methods that interleave verification with executable code, thus avoiding the inevitable discrepancies that arise when the code and the proofs are handled in different languages. Although the term was coined by Necula [50] almost three decades ago, with time it grew to encompass any language powerful enough to handle both the coding and the proving. Examples are dependently-typed (Agda, Idris, Coq/Rocq) and refinement-typed (F*, Liquid Haskell) languages.


Neural Network Verification is a Programming Language Challenge

Cordeiro, Lucas C., Daggitt, Matthew L., Girard-Satabin, Julien, Isac, Omri, Johnson, Taylor T., Katz, Guy, Komendantskaya, Ekaterina, Lemesle, Augustin, Manino, Edoardo, Šinkarovs, Artjoms, Wu, Haoze

arXiv.org Artificial Intelligence

Neural network verification is a new and rapidly developing field of research. So far, the main priority has been establishing efficient verification algorithms and tools, while proper support from the programming language perspective has been considered secondary or unimportant. Yet, there is mounting evidence that insights from the programming language community may make a difference in the future development of this domain. In this paper, we formulate neural network verification challenges as programming language challenges and suggest possible future solutions.


Predicting the Geothermal Gradient in Colombia: a Machine Learning Approach

Mejía-Fragoso, Juan Camilo, Florez, Manuel A., Bernal-Olaya, Rocío

arXiv.org Artificial Intelligence

Accurate determination of the geothermal gradient is critical for assessing the geothermal energy potential of a given region. Of particular interest is the case of Colombia, a country with abundant geothermal resources. A history of active oil and gas exploration and production has left drilled boreholes in different geological settings, providing direct measurements of the geothermal gradient. Unfortunately, large regions of the country where geothermal resources might exist lack such measurements. Indirect geophysical measurements are costly and difficult to perform at regional scales. Computational thermal models could be constructed, but they require very detailed knowledge of the underlying geology and uniform sampling of subsurface temperatures to be well-constrained. We present an alternative approach that leverages recent advances in supervised machine learning and available direct measurements to predict the geothermal gradient in regions where only global-scale geophysical datasets and coarse geological knowledge are available. We find that a Gradient Boosted Regression Tree algorithm yields optimal predictions and extensively validate the trained model. We show that predictions of our model are within 12% accuracy and that independent measurements performed by other authors agree well with our model. Finally, we present a geothermal gradient map for Colombia that highlights regions where further exploration and data collection should be performed.
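The modelling choice described above, a gradient-boosted regression tree fit to geophysical covariates, can be sketched with scikit-learn. Everything here is a stand-in: the features, targets, and hyperparameters are synthetic placeholders, not the authors' dataset or tuned model.

```python
# Hedged sketch of a gradient-boosted regression tree predicting a
# geothermal gradient (degC/km) from geophysical features. Data are
# synthetic placeholders for covariates such as heat flow or crustal
# thickness; this is not the paper's dataset or configuration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # 500 synthetic "boreholes"
y = 25 + 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

r2 = model.score(X_te, y_te)   # held-out coefficient of determination
```

In practice, the trained model would then be evaluated against held-out borehole measurements and applied over a regional grid of the covariates to produce a gradient map.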


NLP Verification: Towards a General Methodology for Certifying Robustness

Casadio, Marco, Dinkar, Tanvi, Komendantskaya, Ekaterina, Arnaboldi, Luca, Daggitt, Matthew L., Isac, Omri, Katz, Guy, Rieser, Verena, Lemon, Oliver

arXiv.org Artificial Intelligence

Deep neural networks have exhibited substantial success in the field of Natural Language Processing and ensuring their safety and reliability is crucial: there are safety-critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Unlike Computer Vision, NLP lacks a unified verification methodology and, despite recent advancements, existing works are often light on the pragmatic issues of NLP verification. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline that emerge from the progress in the field to date. Our contributions are two-fold. Firstly, we give a general (i.e. algorithm-independent) characterisation of verifiable subspaces that result from embedding sentences into continuous spaces. We identify, and give an effective method to deal with, the technical challenge of semantic generalisability of verified subspaces; and propose it as a standard metric in NLP verification pipelines (alongside the standard metrics of model accuracy and model verifiability). Secondly, we propose a general methodology to analyse the effect of the embedding gap -- a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of the sentences which the geometric subspaces are supposed to represent. In extreme cases, poor choices in embedding of sentences may invalidate verification results. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap; in particular, we propose the metric of falsifiability of semantic subspaces as another fundamental metric to be reported as part of the NLP verification pipeline. We believe that together these general principles pave the way towards a more consolidated and effective development of this new domain.


NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs

Antoniak, Maria, Naik, Aakanksha, Alvarado, Carla S., Wang, Lucy Lu, Chen, Irene Y.

arXiv.org Artificial Intelligence

Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for ethical use of NLP for maternal healthcare, grouped into three themes: (i) recognizing contextual significance, (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.


Counterfactuals Modulo Temporal Logics

Finkbeiner, Bernd, Siber, Julian

arXiv.org Artificial Intelligence

Evaluating counterfactual statements is a fundamental problem for many approaches to causal reasoning [40]. Such reasoning can, for instance, be used to explain erroneous system behavior with a counterfactual statement such as 'If the input i at the first position of the observed computation π had not been enabled, then the system would not have reached an error e', which can be formalized using the counterfactual operator □→ and the temporal operator F as π ⊨ (¬i) □→ ¬(F e). Since the foundational work by Lewis [38] on the formal semantics of counterfactual conditionals, many applications for counterfactuals [28, 5, 34, 46, 3, 15] and some theoretical results on the decidability of the original theory [37] and related notions [20, 2] have been discovered. Still, certain domains have proven elusive for a long time, for instance, theories involving higher-order reasoning and an infinite number of variables. In this paper, we consider a domain that combines both of these aspects: temporal reasoning over infinite sequences. In particular, we consider counterfactual conditionals that relate two properties expressed in temporal logics, such as the temporal property F e from the introductory example. Temporal logics are used ubiquitously as high-level specifications for verification [21, 4] and synthesis [22, 41], and recently have also found use in specifying reinforcement learning tasks [32, 39]. Our work lifts the language of counterfactual reasoning to similar high-level expressions.
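For readers unfamiliar with the counterfactual operator, the classic Lewis-style truth condition can be sketched as follows. The notation is illustrative (it assumes a similarity ordering on traces and the so-called Limit Assumption); the paper adapts such semantics to infinite computations.

```latex
% Lewis-style evaluation of a counterfactual \varphi \boxright \psi at a
% trace \pi, under a similarity ordering \preceq_\pi: the consequent must
% hold on all \varphi-traces that are minimally distant from \pi.
\pi \models \varphi \boxright \psi
  \iff
  \forall \pi' \in \min\nolimits_{\preceq_\pi}
      \{\, \pi'' \mid \pi'' \models \varphi \,\} :\;
  \pi' \models \psi
```

Intuitively: among all counterfactual computations where the antecedent holds, only the ones most similar to the observed computation matter for evaluating the conditional.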


How banks and fintech are using artificial intelligence to deliver loans - The Goa Sportlight

#artificialintelligence

Financial technology services are increasingly large and diverse, representing a change not only for users but also for banks, which have had to adapt as new developments allow greater knowledge of the market and customers. Against this backdrop, a platform has been launched in Colombia that will use advanced artificial intelligence to generate a credit score for each person and allow financial institutions to identify potential clients. The new system is developed by the fintech Yabx, which specializes in enabling credit for unbanked sectors. Through an alliance with the telecommunications operator Claro, it will base its scores on telecom data, allowing the identification of new clients not recognized by the criteria of traditional banking. The platform will use machine-learning algorithms to provide a credit score and other products that can be offered to banks or other fintech companies that want to improve their ability to acquire and qualify customers whose applications are rejected by traditional banks. Thanks to the association with Claro, one of the largest telecommunications networks in the country, the new system will be able to cover around 67% of Colombian adults and will allow credit institutions to reduce their rejection rates by up to 40% by taking into account factors that are not normally observed.



Scalable Prototype Selection by Genetic Algorithms and Hashing

Plasencia-Calaña, Yenisel, Orozco-Alzate, Mauricio, Méndez-Vázquez, Heydi, García-Reyes, Edel, Duin, Robert P. W.

arXiv.org Machine Learning

Classification in the dissimilarity space has become a very active research area since it provides a possibility to learn from data given in the form of pairwise non-metric dissimilarities, which otherwise would be difficult to cope with. The selection of prototypes is a key step for the further creation of the space. However, despite previous efforts to find good prototypes, how to select the best representation set remains an open issue. In this paper we propose scalable methods to select the set of prototypes out of very large datasets. The methods are based on genetic algorithms, dissimilarity-based hashing, and two different unsupervised and supervised scalable criteria. The unsupervised criterion is based on the Minimum Spanning Tree of the graph created by the prototypes as nodes and the dissimilarities as edges. The supervised criterion is based on counting matching labels of objects and their closest prototypes. The suitability of this type of algorithm is analyzed for the specific case of dissimilarity representations. The experimental results showed that the methods select good prototypes taking advantage of the large datasets, and they do so at low runtimes. Preprint submitted to Elsevier December 27, 2017. 1. Introduction The vector space representation is a common option to represent the data for learning tasks since many statistical techniques are applicable for this kind of representation. However, there is an increasing number of real-world problems which are not vectorial. Instead, the data are given in terms of pairwise dissimilarities which may be non-Euclidean and even non-metric.
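The supervised criterion mentioned above (counting objects whose label matches that of their closest prototype) can be sketched directly from a dissimilarity matrix. The function name, toy data, and dissimilarity are illustrative choices, not the paper's implementation.

```python
# Hedged sketch of the supervised selection criterion: score a candidate
# prototype set by counting objects whose label matches the label of
# their nearest prototype under the given dissimilarities.
import numpy as np

def supervised_score(D, labels, prototype_idx):
    """D: (n, n) pairwise dissimilarity matrix; labels: (n,) class labels;
    prototype_idx: indices of the candidate prototypes."""
    cols = np.asarray(prototype_idx)
    # For each object, find its closest prototype...
    nearest = cols[np.argmin(D[:, cols], axis=1)]
    # ...and count how often the labels agree.
    return int(np.sum(labels[nearest] == labels))

# Toy example: two tight 1-D clusters, one prototype drawn from each.
points = np.array([[0.0], [0.1], [5.0], [5.1]])
D = np.abs(points - points.T)          # non-negative dissimilarities
labels = np.array([0, 0, 1, 1])
score = supervised_score(D, labels, [0, 2])   # one prototype per class
```

A genetic algorithm would then evolve candidate index sets, using a score such as this (or the MST-based unsupervised criterion) as the fitness function, with hashing used to keep nearest-prototype lookups scalable.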